Towards Semantic Annotation of Bioinformatics Services: Building a Controlled Vocabulary

Authors

Hammad Afzal

Robert Stevens

Goran Nenadic

Abstract

Most bio text-mining efforts so far have focused on identifying biological, molecular and chemical entities in the literature to support knowledge acquisition and discovery in the life sciences. There is also a growing number of bioinformatics services and tools available. This raises the challenging problem of semi-automated annotation, documentation and discovery of services suitable for a specific data analysis and/or for integration into workflows. The first step in this process would be to build a controlled vocabulary that describes bioinformatics services and can then be used for service retrieval and discovery. In this paper we present a methodology that combines lexical and contextual profiles of candidate terms to suggest terms for the bioinformatics vocabulary. The method achieved an estimated precision in the range 70-90%, with recall between 20 and 90%. After processing the whole of BMC Bioinformatics, almost 80% of the top 300 terms were judged to be conceptual terms relevant for describing the major concepts in bioinformatics. The method also extracted a number of service and tool names. The controlled vocabulary is freely available at: http://gnode1.mib.man.ac.uk/bioinf/CV.
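The abstract does not say how the lexical and contextual profiles are built or combined, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that a lexical profile is the set of words in a term, that a contextual profile is a set of words seen near the term, and that the two are compared with Jaccard similarity and merged by a weighted average. The function names, the `weight` parameter and the seed data are all hypothetical.

```python
# Illustrative sketch only: every definition here is an assumption,
# not the scoring scheme used in the paper.

def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def lexical_profile(term):
    """Assumed lexical profile: the lower-cased words that make up the term."""
    return set(term.lower().split())


def score_candidate(candidate, candidate_context, seeds, weight=0.5):
    """Score a candidate term against known seed terms.

    seeds maps each seed term to an iterable of its context words.
    weight is the (hypothetical) share given to lexical similarity;
    the remainder goes to contextual similarity.
    Returns the best combined similarity over all seeds.
    """
    cand_lex = lexical_profile(candidate)
    cand_ctx = {w.lower() for w in candidate_context}
    best = 0.0
    for seed, seed_context in seeds.items():
        lex = jaccard(cand_lex, lexical_profile(seed))
        ctx = jaccard(cand_ctx, {w.lower() for w in seed_context})
        best = max(best, weight * lex + (1 - weight) * ctx)
    return best


# Made-up seed vocabulary and contexts, for demonstration only.
seeds = {"sequence alignment": ["algorithm", "protein", "pairwise"]}
related = score_candidate(
    "multiple sequence alignment", ["algorithm", "protein", "tool"], seeds
)
unrelated = score_candidate("coffee machine", ["kitchen"], seeds)
print(related, unrelated)
```

In a setup like this, candidates would be ranked by score and the top of the list reviewed by hand, which is consistent with the evaluation of the top 300 terms that the abstract reports.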

Related resources

BIM: an open ontology for the annotation of biomedical images

Biomedical images published within the scientific literature play a central role in reporting and facilitating life science discoveries. Existing ontologies and vocabularies describing biomedical images, particularly sequence images, do not provide sufficient semantic representation ...

Identifying informative subsets of the Gene Ontology with information bottleneck methods

MOTIVATION The Gene Ontology (GO) is a controlled vocabulary designed to represent the biological concepts pertaining to gene products. This study investigates the methods for identifying informative subsets of GO terms in an automatic and objective fashion. This task in turn requires addressing the following issues: how to represent the semantic context of GO terms, what metrics are suitable f...

The Effects of Multimedia Annotations on Iranian EFL Learners’ L2 Vocabulary Learning

In our modern technological world, Computer-Assisted Language learning (CALL) is a new realm towards learning a language in general, and learning L2 vocabulary in particular. It is assumed that the use of multimedia annotations promotes language learners’ vocabulary acquisition. Therefore, this study set out to investigate the effects of different multimedia annotations (still picture annotatio...

Ontology for immunogenetics: the IMGT-ONTOLOGY

MOTIVATION IMGT, the international ImMunoGeneTics database (http://imgt.cines.fr:8104), created by M.-P. Lefranc, is an integrated database specializing in antigen receptors (immunoglobulins and T-cell receptors) and major histocompatibility complex (MHC) of all vertebrate species. IMGT accurate immunogenetics data are based on the standardization of the biological knowledge provided by the 'ImM...

caCORE: A common infrastructure for cancer informatics

MOTIVATION Sites with substantive bioinformatics operations are challenged to build data processing and delivery infrastructure that provides reliable access and enables data integration. Locally generated data must be processed and stored such that relationships to external data sources can be presented. Consistency and comparability across data sets require annotation with controlled vocabul...

Publication date: 2008
